TechPat: Technical Phrase Extraction for Patent Mining
Abstract
In recent years, due to the explosive growth of patent applications, patent mining has drawn extensive attention and interest. An important issue in patent mining is recognizing the technologies contained in patents, which serves as a fundamental preparation for deeper analysis. To this end, in this article, we make a focused study on constructing a technology portrait for each patent, i.e., recognizing the technical phrases concerned in it, which can summarize and represent patents from a technology perspective. Along this line, a critical challenge is how to analyze the unique characteristics of technical phrases and illustrate them with definite descriptions. Therefore, we first generate detailed descriptions of the technical phrases existing in patents based on different criteria, including various previous works, practical experience, and statistical analyses. Then, considering the unique characteristics of technical phrases and the complex structure of patent documents, such as multi-aspect semantics and multi-level relevances, we further propose a novel unsupervised model, namely TechPat, which can not only automatically recognize technical phrases from massive patents but also avoid the need for expensive human labeling. After that, we evaluate the extraction results from various aspects. Specifically, we propose a novel evaluation metric called Information Retrieval Efficiency (IRE) to quantify the performance of the extracted technical phrases from a new perspective. Extensive experiments on real-world patent data demonstrate that the TechPat model can effectively discriminate technical phrases in patents and greatly outperform existing methods. We further apply the extracted technical phrases to two practical application tasks, patent search and patent classification, where the experimental results confirm the wide application prospects of technical phrases. Finally, we discuss the generalization ability of our proposed methods.
Similar resources
Extraction of Bilingual Technical Terms for Chinese-Japanese Patent Translation
The translation of patents or scientific papers is a key issue that should be helped by the use of statistical machine translation (SMT). In this paper, we propose a method to improve Chinese–Japanese patent SMT by premarking the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic fil...
Challenges for Discontiguous Phrase Extraction
Suggestions are made as to how phrase extraction algorithms should be adapted to handle gapped phrases. Such variable phrases are useful for many purposes, including the characterization of learner texts. The basic problem is that there is a combinatorial explosion of such phrases. Any reasonable program must start by putting the exponentially many phrases into equivalence classes (Yamamoto and...
Forced Decoding for Phrase Extraction
Forced decoding means using a decoder and an existing phrase table, along with other features of a phrase-based decoding model, to find the most likely phrase alignment between a source text and a reference translation. “Forced” refers to the fact that the decoder is constrained to produce only the reference translation, although not necessarily in only one way. This technique—combined with a l...
Key-phrase Extraction for Classification
In this paper we consider the problem of extracting key-phrases from a bilingual texts collection and using them for text classification. A key-phrase could be defined as a sequence of words of a given size in a given partial order that occur within a sentence. We describe an algorithm for the discovery of key-phrases. Then, a framework of handling multilingual texts / documents is described wh...
Semi-Automatic Identification of Bilingual Synonymous Technical Terms from Phrase Tables and Parallel Patent Sentences
In the research field of machine translation of patent documents, the issue of acquiring technical term translation equivalent pairs automatically from parallel patent documents is one of those most important. We take an approach of utilizing the phrase table of a state-of-the-art phrase-based statistical machine translation model. In this task, we consider situations where a technical term is ...
Journal
Journal title: ACM Transactions on Knowledge Discovery from Data
Year: 2023
ISSN: 1556-472X, 1556-4681
DOI: https://doi.org/10.1145/3596603